A BAYES MODEL IN SEQUENTIAL DESIGN

Samuel Karlin and S. M. Johnson

Introduction 

This paper is concerned with the Bayes problem of how to
maximize the expected number of successes in N trials when at
each trial we are free to choose between two machines I and II
whose probabilities of success $\rho$ and $\sigma$ are unknown but have a
known a priori distribution $F(\rho, \sigma)$.

We have adopted the use of the terms machine I and II to
expedite the discussion. Many other interpretations and applications
can be found for the theory developed below. This is a
type of problem classified as sequential design. No nontrivial
examples in this field have been analyzed as far as we know to
the present date, and this represents an attempt to study some
models, to develop some qualitative results, and to focus attention
on some of the difficulties involved by suitable examples.
One particular model is analyzed completely in §3. It is interesting
to note that the intuitive simple strategies are usually not
optimal and, moreover, that the optimal strategies in
general seem to be of a very complicated nature. However,
approximate optimal strategies are discussed in several contexts.

In §2 we have analyzed the relevance of the strategy $S_0$
which employs at each stage the machine with the maximum a priori
expected value. It is shown that this strategy is rarely optimal.
Other features suspected about optimal strategies are exploded.
§3 deals with the case where one machine has a known probability
of success while the other has only a known a priori distribution.
This case is handled completely and serves to illuminate the complex
nature of optimal strategies. §4 treats certain game extensions
associated with the Bayes problem.

§1. The General Formulation

Let $S$ denote any strategy for choosing between the two
machines and let $V_S(\rho, \sigma)$ denote the expected number of successes
following strategy $S$ for given $(\rho, \sigma)$. Then the expected number
of successes based on policy $S$ is

(1)  $\Phi_S(F) = \int\!\!\int V_S(\rho, \sigma)\, dF(\rho, \sigma)$.

The best procedure is the one maximizing $\Phi_S(F)$. Since $N$ is
finite, the maximum is well defined.

In computing $\Phi_S(F)$ one can extract the following formal
procedure. We determine the conditional a priori distribution
of $(\rho, \sigma)$ on the $k$-th trial, given that $s_1$ successes and $f_1$
failures from I, and $s_2$ successes and $f_2$ failures from II, with
$s_1 + f_1 + s_2 + f_2 = k - 1$, have preceded. In fact,

$\Pr(\rho, \sigma \mid s_1, f_1, s_2, f_2) = \dfrac{\Pr(s_1, f_1, s_2, f_2 \mid \rho, \sigma)\,\Pr(\rho, \sigma)}{\Pr(s_1, f_1, s_2, f_2)}$.

We thus obtain as the a posteriori distribution

(2)  $dF(\rho, \sigma \mid s_1, f_1, s_2, f_2) = \dfrac{\rho^{s_1}(1-\rho)^{f_1}\sigma^{s_2}(1-\sigma)^{f_2}\, dF(\rho, \sigma)}{\int_0^1\!\!\int_0^1 \rho^{s_1}(1-\rho)^{f_1}\sigma^{s_2}(1-\sigma)^{f_2}\, dF(\rho, \sigma)}$.

The contribution to $\Phi_S(F)$ then becomes the first $\rho$ or $\sigma$ moment
of the distribution (2) according to whether I or II is used
at the $k$-th stage. This a posteriori distribution (2) is independent
of the order of presentation of the information, as can
be easily verified.
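
The order-independence of the a posteriori distribution can be checked numerically. The following is a minimal sketch, assuming a prior concentrated on finitely many points $(\rho, \sigma)$; the three-point prior, the history, and the function names `update` and `posterior` are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import permutations

def update(dist, machine, success):
    """One Bayes step of formula (2) on a finite prior {(rho, sigma): mass}."""
    weighted = {}
    for (rho, sigma), mass in dist.items():
        p = rho if machine == "I" else sigma
        weighted[(rho, sigma)] = mass * (p if success else 1.0 - p)
    total = sum(weighted.values())
    return {point: w / total for point, w in weighted.items()}

def posterior(prior, history):
    """Apply the observations in `history` one after another."""
    dist = prior
    for machine, success in history:
        dist = update(dist, machine, success)
    return dist

# Hypothetical three-point prior and a short history of plays.
prior = {(0.2, 0.7): 0.5, (0.6, 0.4): 0.3, (0.9, 0.1): 0.2}
history = [("I", True), ("II", False), ("I", False)]

reference = posterior(prior, history)
same = all(
    math.isclose(posterior(prior, list(order))[point], reference[point])
    for order in permutations(history)
    for point in prior
)
print(same)
```

Every ordering of the same observations multiplies each prior mass by the same likelihood factor, so only the counts $s_1, f_1, s_2, f_2$ matter.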

One example of a very natural strategy $S$ is the principle:
maximize expected value at each stage. Precisely, the quantities
$\int \rho\, dF(\rho, \sigma)$ and $\int \sigma\, dF(\rho, \sigma)$ are compared and machine I is chosen
over machine II if the first integral exceeds
the second. The outcome of the first trial then determines an
a posteriori distribution $F^*$ for which the same criterion on the
first moments of $F^*$ indicates the machine to be played for the
second step, etc. This particular strategy we shall call the
"stagewise maximization principle" and designate it by $S_0$. In
the following simple example, $S_0$ is optimal.

Example 1. If $\rho + \sigma = 1$, then $F(\rho, \sigma)$ is of the form
$F(\rho, 1 - \rho)$. Thus a success or failure on I is equivalent to
(gives the same information as) a failure or success on II,
respectively.

Let $X = \rho^{s_1}(1-\rho)^{f_1}\sigma^{s_2}(1-\sigma)^{f_2}$ and write $dF$ for $dF(\rho, \sigma)$.
Let $E_k(X)$ be the maximum expected number of successes when playing
optimally for $k$ more trials given the history indicated by $X$.

Then

$\int X\rho\, dF + \left(\int X\rho\, dF\right)E_{k-1}(X\rho) + \left(\int X(1-\rho)\, dF\right)E_{k-1}(X(1-\rho))$
$\ge \int X\sigma\, dF + \left(\int X\sigma\, dF\right)E_{k-1}(X\sigma) + \left(\int X(1-\sigma)\, dF\right)E_{k-1}(X(1-\sigma))$

if and only if $\int X\rho\, dF \ge \int X\sigma\, dF$, since $\sigma = 1 - \rho$ makes the
continuation terms on the two sides coincide; that is, $S_0$ is
the optimal policy.

§2. Qualitative Results about Optimal Procedures

Our first task is to obtain the complete procedure for
$N = 2$ when $F(\rho, \sigma) = F(\rho)G(\sigma)$. It is important to emphasize
that if the number of moves is $n$, then only strategies which
are functions of the first $n$ moments $\mu_1, \mu_2, \cdots, \mu_n$ of $F$ and
$\mu'_1, \mu'_2, \cdots, \mu'_n$ of $G$ need to be considered. This is a consequence
of the fact that the expected yield for any given strategy
is an expression involving at most these moments. Thus all
strategies describing a first move can be viewed as functions
$S_1(\mu_1, \cdots, \mu_n, \mu'_1, \cdots, \mu'_n)$ such that if
$S_1(\mu_1, \cdots, \mu_n, \mu'_1, \cdots, \mu'_n) > 0$, then I is chosen at the first stage
and II in the contrary case. Let $\mu_i = \int \rho^i\, dF(\rho)$ and
$\mu'_i = \int \sigma^i\, dG(\sigma)$.

Suppose for definiteness that $\mu_1 \ge \mu'_1$; we determine necessary and
sufficient conditions that I is employed first when $N = 2$.
Using the fact that at the last step one maximizes expected
value, we secure in this circumstance the value

(1)  $\mu_1 + \mu_2 + (1 - \mu_1)\max\left(\dfrac{\mu_1 - \mu_2}{1 - \mu_1},\ \mu'_1\right)$.

We now obtain the best possible value attainable if machine II
is used first. The strategy of playing II at the first step
and then changing regardless of the outcome is dominated by (1).
Indeed, since $\mu_1 \ge \mu'_1$, (1) is greater than or equal to $2\mu_1 \ge \mu_1 + \mu'_1$,
the value obtained according to the above strategy.

Consequently, $\mu'_1 + \mu'_2 + (1 - \mu'_1)\mu_1$ is the only attainable
value needing to be considered if one begins with II. Choosing
each term of the maximum of (1) yields the two inequalities

(2)  $\mu_2 \ge \mu'_2$,  $\mu_1 + \mu_1\mu'_1 \ge \mu'_1 + \mu'_2$;

machine I is preferred at the first step exactly when at least
one of them holds.

Combining and rewriting in a symmetric form, we have

Lemma 1. If $N = 2$ and the machines have independent a priori
distributions of probabilities of success with moments $\mu_i$ and
$\mu'_i$, then a necessary and sufficient condition that machine II
is played first is that

$\max(\mu'_2 - \mu_1\mu'_1,\ \mu'_1 - \mu_1) > \max(\mu_2 - \mu_1\mu'_1,\ \mu_1 - \mu'_1)$.
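
The two-trial criterion can be compared against a direct computation of the two-trial yields. The sketch below assumes independent Beta priors purely as a convenient family with exact rational moments; `beta_moments`, `value_if_first` and `lemma1_plays_II_first` are illustrative names, and `nu1`, `nu2` stand for the primed moments $\mu'_1$, $\mu'_2$. The criterion coded is $\max(\mu'_2 - \mu_1\mu'_1,\ \mu'_1 - \mu_1) > \max(\mu_2 - \mu_1\mu'_1,\ \mu_1 - \mu'_1)$.

```python
from fractions import Fraction as Fr
from itertools import product

def beta_moments(a, b):
    """First two raw moments of a Beta(a, b) prior (an assumed family)."""
    m1 = Fr(a, a + b)
    m2 = m1 * Fr(a + 1, a + b + 1)
    return m1, m2

def value_if_first(m1, m2, other_mean):
    """Best two-trial yield when the machine with moments (m1, m2) goes first."""
    after_success = max(m2 / m1, other_mean)               # posterior mean m2/m1
    after_failure = max((m1 - m2) / (1 - m1), other_mean)  # mean after a failure
    return m1 + m1 * after_success + (1 - m1) * after_failure

def lemma1_plays_II_first(mu1, mu2, nu1, nu2):
    """Symmetric criterion of Lemma 1 for playing II first."""
    return max(nu2 - mu1 * nu1, nu1 - mu1) > max(mu2 - mu1 * nu1, mu1 - nu1)

agree = True
for a, b, c, d in product(range(1, 6), repeat=4):
    mu1, mu2 = beta_moments(a, b)
    nu1, nu2 = beta_moments(c, d)
    direct = value_if_first(nu1, nu2, mu1) > value_if_first(mu1, mu2, nu1)
    agree = agree and (direct == lemma1_plays_II_first(mu1, mu2, nu1, nu2))
print(agree)
```

Exact fractions are used so that ties between the two machines are not disturbed by rounding.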

The next theorem shows that $S_0$ is generally not optimal.

Theorem 1. If machines I and II have a priori distributions
$F(\rho) = \int_0^\rho \varphi(t)\,dt$ and $G(\sigma) = \int_0^\sigma \psi(t)\,dt$ respectively for the

P-328 



probabilities of success where $\varphi(t)$ and $\psi(t)$ are continuous
and positive for $0 < t < 1$, then there exists an $n$ so that for
$n$ trials the optimal procedure does not agree with the strategy
$S_0$ described by stagewise maximization.

Proof: (By contradiction) Suppose for definiteness that

$1 > \int_0^1 t\varphi(t)\,dt = b > a = \int_0^1 t\psi(t)\,dt > 0$;

then clearly at the first trial we use I. According to the strategy $S_0$,
it is clear by the Schwarz inequality that we stay with the machine being
used whenever success occurs. It is easily shown in view of the
hypothesis on $\varphi(t)$ that if $r/(r+s) \to t_0$, then

$\dfrac{\int_0^1 t^{r+1}(1-t)^s\varphi(t)\,dt}{\int_0^1 t^r(1-t)^s\varphi(t)\,dt} \to t_0$.

This can also be obtained as a consequence of the law of large
numbers where the frequency of success tends to $t_0$.

We choose $r$, $s$ sufficiently large and $\left|\frac{r}{r+s} - a\right| < \epsilon$
so that

(3)  $a + \epsilon > \dfrac{\int t^{r+1}(1-t)^s\varphi(t)\,dt}{\int t^r(1-t)^s\varphi(t)\,dt} > a = \int_0^1 t\psi(t)\,dt$, and
     $a + \epsilon > \dfrac{\int t^{r+2}(1-t)^s\varphi(t)\,dt}{\int t^{r+1}(1-t)^s\varphi(t)\,dt} > a$.

The approximation from above is easy to insure by approximating
to $a + \epsilon/2$. Furthermore, $\epsilon$ is chosen sufficiently small so
that the above holds and also $\int t^2\psi(t)\,dt > \left(\int t\psi(t)\,dt\right)^2 + 3\epsilon$.

Let $n = r + s + 2$. Suppose that using I first resulted in $r$
consecutive successes and then $s$ failures. This agrees with the
procedure prescribed by strategy $S_0$ and this situation occurs
with positive probability. At the $r + s + 1$ step, in view of the
first equation of (3), one should continue with I according to
strategy $S_0$ (maximization stagewise). We now show that both
inequalities of (2) are violated and thus machine II should be
used to furnish an optimal return. Indeed, the distributions
of successes at the beginning of the $r + s + 1$ step are, in this
situation,

$dF_1(\rho) = \dfrac{\rho^r(1-\rho)^s\varphi(\rho)\,d\rho}{\int_0^1 \rho^r(1-\rho)^s\varphi(\rho)\,d\rho}$,  $G(\sigma) = \int_0^\sigma \psi(t)\,dt$.

On account of (3),

$\int_0^1 \rho\, dF_1(\rho) > \int_0^1 \sigma\psi(\sigma)\,d\sigma$.

But

$\int \rho^2\, dF_1(\rho) = \dfrac{\int \rho^{r+2}(1-\rho)^s\varphi(\rho)\,d\rho}{\int \rho^{r+1}(1-\rho)^s\varphi(\rho)\,d\rho} \cdot \dfrac{\int \rho^{r+1}(1-\rho)^s\varphi(\rho)\,d\rho}{\int \rho^r(1-\rho)^s\varphi(\rho)\,d\rho} \le (a + \epsilon)^2 < a^2 + 3\epsilon < \int_0^1 t^2\psi(t)\,dt$.

Also,

$\int \rho\, dF_1(\rho) + \int \rho\, dF_1(\rho) \int \sigma\psi(\sigma)\,d\sigma < a + \epsilon + a^2 + \epsilon < \int t\psi(t)\,dt + \int t^2\psi(t)\,dt$.

Hence, following $S_0$ we arrive at a nonoptimal yield and the
theorem is established.

We further remark that Theorem 1 can be established for
almost all pairs of independent distributions. Only in trivial
cases where the a posteriori distribution of I for any possible
outcomes will always have larger expected value than that of II,
will it be true for all $n$ that the principle of "maximization
stepwise" agrees with the optimal procedure. We have chosen to
illustrate this theorem by the class of distributions considered
above in order to avoid some trivial technical difficulties.

In the case where $F = G$ and with $F(\rho)$ symmetric, i.e.,
$1 - F(1 - \rho) = F(\rho)$, then Theorem 1 is valid in many instances
with $n = 4$. It is clearly immaterial which is played first as
the expected yield is the same and equal to 1/2. Furthermore,
if a success occurs then it is optimal based on the principle
of $S_0$ to continue with the same choice. We suppose now that a
failure occurs on the second trial; then there remain two trials
with distributions $\dfrac{\rho(1-\rho)\,dF(\rho)}{\int \rho(1-\rho)\,dF(\rho)}$ and $dF(\rho)$, respectively, for I
and II. Both are still symmetric and hence possess an expected
value equal to 1/2. Thus, according to the strategy which maximizes
stepwise, it makes no difference which machine is tried at
this third step. This implies according to Lemma 1 that
$\mu_2 = \dfrac{\mu_3 - \mu_4}{\mu_1 - \mu_2}$, or equivalently $\mu_3 - \mu_4 = \mu_2(\mu_1 - \mu_2)$, where the
$\mu_i$ are the moments of $F$. This is generally
impossible, particularly, e.g., when $dF(\rho) = C\rho^a(1 - \rho)^a\, d\rho$.
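
For the symmetric family $dF(\rho) = C\rho^a(1-\rho)^a\,d\rho$ the mismatch of second moments can be seen exactly. The sketch below treats this prior as a Beta$(a+1, a+1)$ distribution and compares the second moment of the prior with the second moment of the a posteriori distribution after one success and one failure, $(\mu_3 - \mu_4)/(\mu_1 - \mu_2)$; the helper names are illustrative assumptions.

```python
from fractions import Fraction as Fr

def beta_raw_moment(alpha, beta, k):
    """k-th raw moment of a Beta(alpha, beta) distribution."""
    m = Fr(1)
    for i in range(k):
        m *= Fr(alpha + i, alpha + beta + i)
    return m

def second_moments(a):
    """Prior dF = C rho^a (1 - rho)^a d rho, i.e. Beta(a + 1, a + 1).

    Returns (second moment of the prior,
             second moment after one success and one failure)."""
    mu = [beta_raw_moment(a + 1, a + 1, k) for k in range(5)]
    prior_m2 = mu[2]
    posterior_m2 = (mu[3] - mu[4]) / (mu[1] - mu[2])
    return prior_m2, posterior_m2

for a in range(4):
    print(a, *second_moments(a))
```

In each case the untouched machine has the larger second moment while both means equal 1/2, so the two-trial criterion of Lemma 1 strictly prefers it, whereas stepwise maximization is indifferent.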

The strategy $S_0$ can be described as the procedure which
makes that choice at each trial which would have been optimal
had there been only one trial left.

Let $T_j$ be the strategy making the choice at each stage
which would have been optimal if there were $j$ trials left, with
the understanding that when fewer than $j$ trials remain then the
optimal procedure is followed thereafter.

Thus $T_1 = S_0$, and the strategy $T_2$ for independent distributions
is determined by the relations of Lemma 1.

In this way we can obtain a whole hierarchy of strategies
$T_j$, $j = 1, 2, \cdots, N$. Intuitively one might expect that these
strategies are successive improvements. Of course, when $N = 2$,
then $T_2 \ge T_1$ ($T_2$ is indeed optimal), for $N = 3$ ($T_3 \ge T_1, T_2$), etc.
We now produce an example for independent populations which shows
that for $N = 3$, $T_1 = S_0 > T_2$. This negates and destroys the




above intended direction for improving strategies. To this end,
suppose $\mu_1, \mu_2, \mu_3$ and $\mu'_1, \mu'_2, \mu'_3$ represent the first three
moments of two distributions $F(\rho)$ and $G(\sigma)$ respectively, and
that $\mu_1 > \mu'_1$, $\mu_2 < \mu'_2$, $\mu_1 + \mu_1\mu'_1 < \mu'_1 + \mu'_2$,
$\mu_1 > \dfrac{\mu'_2 - \mu'_3}{\mu'_1 - \mu'_2}$, and $\mu'_1 > \dfrac{\mu_2 - \mu_3}{\mu_1 - \mu_2}$.
In view of these inequalities, according to
$T_2$ we readily obtain for $N = 3$ the expected yield

$\mu'_1 + \mu'_2 + \mu'_3 + (\mu'_1 - \mu'_2)\mu_1 + (1 - \mu'_1)\left[\mu_1 + \mu_2 + \max\left(\dfrac{\mu_1 - \mu_2}{1 - \mu_1},\ \dfrac{\mu'_1 - \mu'_2}{1 - \mu'_1}\right)(1 - \mu_1)\right]$.

In a similar manner we can get the expected yield following
strategy $T_1 = S_0$. The difference becomes $T_1 - T_2 = \mu_3 - \mu'_3 > 0$.

To complete the example, it remains only to construct two distributions
with moments satisfying the above inequalities. Let
$a_1, a_2, a_3$ and $b_1, b_2, b_3$ denote the successive moments of two
distributions $H$ and $K$ where $a_1 > b_1$, $a_2 < b_2$ and $a_3 > b_3$;
e.g., $a_1 = \tfrac12$, $a_2 = \tfrac13$, $a_3 = \tfrac14$, $b_1 = \tfrac12 - \eta$, $b_2 = \tfrac13 + \eta$, $b_3 = \tfrac14 - \eta$,
which for $\eta$ sufficiently small are the moments of
a distribution since $(a_1, a_2, a_3)$ is an interior point of the
moment space of order 3. Let $c_1, c_2, c_3$ denote the moments of

a  distribution  L  satisfying  c.  ^  ?  .  The  Schwartz  inequality 

c  -c  12 

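implies that $c_1 > \dfrac{c_2 - c_3}{c_1 - c_2}$. Let $F(\rho) = \epsilon H(\rho) + (1 - \epsilon)L(\rho)$ and
$G(\sigma) = \epsilon K(\sigma) + (1 - \epsilon)L(\sigma)$ with $\epsilon$ chosen sufficiently small.
Then

$\mu_i = \epsilon a_i + (1 - \epsilon)c_i$ and $\mu'_i = \epsilon b_i + (1 - \epsilon)c_i$.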
We immediately get that $\mu_1 > \mu'_1$, $\mu_2 < \mu'_2$, and $\mu_3 > \mu'_3$ from
the corresponding properties of the $a_i$ and $b_i$. The remaining
inequalities required above follow, for $\epsilon$ sufficiently small, since
the corresponding strict inequalities hold for the moments $c_i$ of $L$.

The next principle we examine is that of "staying on a
winner." The question is: does the optimal strategy
have the property that whenever success occurs on a given play
of a machine this same machine is tried at the next trial? This
is not always optimal for the case of dependent distributions
$F(\rho, \sigma)$. Consider the following example with two trials: $F(\rho, \sigma)$ concentrates
at two points, $(\epsilon, 0)$ with probability $\lambda$ and $(1 - \epsilon, 1)$ with probability
$1 - \lambda$. With $\epsilon = .1$ and $\lambda = .8$, consider

(4)  $\epsilon^2\lambda + (1 - \epsilon)^2(1 - \lambda) < (1 - \epsilon)(1 - \lambda)$

and

(5)  $\epsilon(1 - \epsilon)\lambda + (1 - \epsilon)\epsilon(1 - \lambda) > \epsilon(1 - \lambda)$.

The inequalities yield the following: If machine I is used and
success results, then (4) implies that II is to be played next,
while if failure results, then (5) requires that I is again to
be used. The interpretation becomes that if success results,
then it is highly likely that the sample consists of machines
of high probability of success and hence II is preferred. A
similar meaning is attributed to the situation of failure.
On the other hand, if one chooses II first, then perfect information
results, and on the basis of the outcome the play is
evident for the last step.

Computing the expected value starting with I yields

$(1 - \epsilon)(1 - \lambda)[2 + \epsilon] + \epsilon\lambda(2 - \epsilon) = .53$.

The expected yield beginning with II gives $2(1 - \lambda) + \lambda\epsilon = .48$.
Consequently, in general, the "staying on a winner" principle
does not apply. However, it is conjectured that when the machines
come from independent populations, then this principle is valid
for the optimal strategy.
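
The two-point example can be verified by direct enumeration. The sketch below is an illustrative brute-force dynamic program over a finite joint prior, with function names chosen here for convenience; with $\epsilon = .1$ and $\lambda = .8$ it gives 0.53 when starting with I and $2(1 - \lambda) + \lambda\epsilon = 0.48$ when starting with II.

```python
def success_prob(dist, machine):
    """Probability of success on `machine` under a finite joint prior."""
    idx = 0 if machine == "I" else 1
    return sum(mass * point[idx] for point, mass in dist.items())

def condition(dist, machine, success):
    """A posteriori distribution after one observed outcome."""
    idx = 0 if machine == "I" else 1
    weighted = {pt: m * (pt[idx] if success else 1 - pt[idx])
                for pt, m in dist.items()}
    total = sum(weighted.values())
    return {pt: w / total for pt, w in weighted.items()}

def best(dist, n):
    """Maximum expected number of successes in n remaining trials."""
    if n == 0:
        return 0.0
    return max(value_starting_with(dist, m, n) for m in ("I", "II"))

def value_starting_with(dist, machine, n):
    """Expected yield when `machine` is played now and play is optimal after."""
    p = success_prob(dist, machine)
    value = p
    if p > 0:  # skip outcomes of probability zero
        value += p * best(condition(dist, machine, True), n - 1)
    if p < 1:
        value += (1 - p) * best(condition(dist, machine, False), n - 1)
    return value

eps, lam = 0.1, 0.8
prior = {(eps, 0.0): lam, (1 - eps, 1.0): 1 - lam}
start_I = value_starting_with(prior, "I", 2)
start_II = value_starting_with(prior, "II", 2)
print(round(start_I, 4), round(start_II, 4))
```

Starting with I is better, and the optimal continuation after a success on I switches to II, which is the failure of "staying on a winner" described above.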

A related concept is the property of "monotonicity," defined
as follows. Let the number of trials be fixed and let the a
priori distributions be $F(\rho)$ and $G(\sigma)$. Suppose that it is
optimal to play I first. Then if $F$ is replaced by $F_s = \dfrac{\rho\, dF}{\int \rho\, dF}$
with $G$ unchanged, suppose it is still true that I is preferred
to II at the first step. In this case, we say that the optimal
strategy is monotone. It is trivial to show that using the same
machine is equivalent to keeping $F$ unchanged but decreasing $G$
to $G_f$.

We assume in what follows that the machines are from independent
universes.

Lemma 2. If the optimal strategy for any number of trials is
monotone, then the principle of "staying on a winner" is valid.

Proof: Suppose that it is definitely better to play I first,
but if success results, shift to II on the next trial. By the
monotonicity assumption, a fortiori, if failure results on the
first trial, II still is played next. But playing I, then II,
and optimizing from then on, is equivalent to playing II, then I,
and then optimizing from then on, since the order of the first
two plays does not affect what follows. This contradicts the
assumption that it is definitely better to play I on the first
trial, and the lemma is established.

We note that to prove the proposition of "staying on a
winner" for $N$ trials it is sufficient to know the monotonicity
criteria for $N - 1$ trials. Using Lemma 1 and Lemma 2 we now
verify the "staying on a winner" principle for 2 and 3 trials.

It is trivial for $N = 2$. For the case $N = 3$, it is
sufficient to show monotonicity for $N = 2$. We need to consider
two cases where I is preferred.

Case 1. $\mu_1 \ge \mu'_1$ and either $\mu_2 \ge \mu'_2$ or $\mu_1 + \mu_1\mu'_1 \ge \mu'_1 + \mu'_2$.
If $\mu_i$ is replaced by $\mu_{i+1}/\mu_1$, then clearly any of the inequalities
valid before continue to hold.

Case 2. $\mu'_1 > \mu_1$ but $\mu_2 > \mu'_2$ and $\mu_1 + \mu_2 > \mu'_1 + \mu_1\mu'_1$. We first
observe that the last inequality implies that, since $\mu'_1 > \mu_1$,
$\mu_2 > \mu_1\mu'_1$ or $\dfrac{\mu_2}{\mu_1} > \mu'_1$. This combined with $\dfrac{\mu_3}{\mu_1} > \mu_2 > \mu'_2$ insures
by Lemma 1 that machine I is chosen at the first trial.


The general monotonicity property for independent machines
remains an open question.

The last general property investigated is whether the a priori
expected value is monotone increasing as a function of the steps
when employing an optimal policy. While this is true if the
strategy is $S_0$, it is not true in general.

First consider the following.

Lemma 3. The strategy $S_0$ applied to any initial distribution $F$
has the property that the a priori expected contribution at each
stage is non-decreasing.

Proof: It is enough to prove the result for the first two
stages. Suppose according to $S_0$ machine I is used first; then
the expected value is $\int \rho\, dF(\rho, \sigma)$. Thus, $\int \rho\, dF(\rho, \sigma) \ge \int \sigma\, dF(\rho, \sigma)$.
If, independent of the outcome, machine I is employed at stage 2,
then the outcome is

$\int \rho\, dF \cdot \dfrac{\int \rho^2\, dF}{\int \rho\, dF} + \int (1 - \rho)\, dF \cdot \dfrac{\int \rho(1 - \rho)\, dF}{\int (1 - \rho)\, dF} = \int \rho\, dF(\rho, \sigma)$.

Consequently, if the machine with maximum expected value is used,
the total expected value is $\ge \int \rho\, dF$.
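
The statement of Lemma 3 can be illustrated by exact enumeration. The following sketch assumes independent Beta priors with integer parameters (an assumption made only so that posterior means are simple fractions) and breaks ties in favor of machine I; `stage_yields` is a hypothetical helper name.

```python
from fractions import Fraction as Fr

def stage_yields(a1, b1, a2, b2, n):
    """A priori expected success probability at each of n stages under S_0,
    for independent Beta(a1, b1) and Beta(a2, b2) priors on machines I and II."""
    yields = [Fr(0)] * n

    def walk(state, prob, stage):
        if stage == n:
            return
        a1, b1, a2, b2 = state
        m1, m2 = Fr(a1, a1 + b1), Fr(a2, a2 + b2)
        if m1 >= m2:  # stagewise maximization, ties go to machine I
            p = m1
            on_success = (a1 + 1, b1, a2, b2)
            on_failure = (a1, b1 + 1, a2, b2)
        else:
            p = m2
            on_success = (a1, b1, a2 + 1, b2)
            on_failure = (a1, b1, a2, b2 + 1)
        yields[stage] += prob * p
        walk(on_success, prob * p, stage + 1)
        walk(on_failure, prob * (1 - p), stage + 1)

    walk((a1, b1, a2, b2), Fr(1), 0)
    return yields

ys = stage_yields(3, 2, 1, 1, 4)
print(ys)
print(all(ys[i] <= ys[i + 1] for i in range(len(ys) - 1)))
```

The per-stage contributions never decrease because, given any history, the best posterior mean at the next stage is on average at least the posterior mean of the machine just played.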

In contrast to this result, consider the case of $N = 3$ with

dF(f )  -  p  LV  Qf3  and  dG(w)  «  dcr  .  It  turns  out  that  for 

jr>(i-ey&p 

optimal return, machine I is preferred first with expected value
for the first move equal to .6. If success results, then I is
played again, while if failure occurs then the criteria of Lemma 1
require II to be chosen. The a priori expected value for the second
step is then less than .6.

Another way to express the fact is that if we let the random
variable $x_r(P)$ represent the yield at the $r$-th stage according to
policy $P$, then one would suspect the sequence of random variables
obtained by the optimal policy would form a semi-martingale. The
example presented above negates this proposition.
We note that always

$\mathrm{Exp}\left(\dfrac{x_1 + \cdots + x_n}{n}\right) \le \int\!\!\int \max(\rho, \sigma)\, dF(\rho, \sigma)$.

It can be shown using the law of large numbers that if $S_0$ is
modified so that at infinitely many trials prescribed in advance
of density zero both I and II are used and otherwise the usual
criteria of $S_0$ are employed, then

$\lim_{n \to \infty} \mathrm{Exp}\left(\dfrac{x_1 + \cdots + x_n}{n}\right) = \int\!\!\int \max(\rho, \sigma)\, dF(\rho, \sigma)$.

This is a type of consistency result. Unfortunately, most procedures
are consistent in the above sense and thus this concept
does not help one choose among strategies.

§3. The Case of One Known and One Unknown Probability of Success

In this section we examine in detail the situation where
$F(\rho, \sigma) = F(\rho)G(\sigma)$ with $G(\sigma) = I_\sigma$. In other words, the distributions
are independent with the probability of success of machine
II known to be $\sigma$. Let $n$ trials be allowed and $F$ be the initial
a priori distribution of success for machine I. Define $K_n(F)$
by the condition that if $\sigma > K_n(F)$, the optimal procedure is
to use the known machine for the first step, and if $\sigma < K_n(F)$,
then machine I is the optimal choice, while if $\sigma = K_n(F)$ either
choice is optimal. We adopt the convention that if at any trial
it is optimal to use either machine, then in that case we choose I.
We seek to determine the form of $K_n(F)$ which represents the decision
function.

The optimal procedure then is given as follows:
If $\sigma \le K_n(F)$, then at the first step one uses the machine
of unknown probability of success. On the other hand, if
$\sigma > K_n(F)$, one uses the known machine. After the first step,
depending on what happened, we compute the new a posteriori
distributions $I_\sigma$ and $F'(\rho)$ and compare $\sigma$ and $K_{n-1}(F')$, following
the above rules as to what to do at the second stage, etc.

We now establish a series of lemmas describing the form of
the optimal strategy.

Lemma 4. If the known machine II is employed at any trial
according to an optimal strategy, then it is used thereafter.

Proof: If the optimal procedure uses II $r$ times ($r < n$)
and then I, the expected value is

(6)  $r\sigma + E(F) + E(F)\,Y(F_s, n-r-1) + [1 - E(F)]\,Y(F_f, n-r-1)$

where $E(F)$ is the expected value of the distribution $F$; $F_s$ is the
a posteriori distribution given success has occurred on I; $F_f$
corresponds to the case where failure happened on I; and $Y(F, n)$
is the optimal expected yield when the a priori distribution
is $F$ and $n$ trials remain. The strategy using I first followed
by $r$ tries on II and then optimal continuation gives the same
yield as in (6). Thus by our convention the optimal procedure
calls for use of I first.

Lemma 5. For any distribution $F$ and $n \ge 2$,

$K_{n-1}(F) \le K_n(F)$.

Proof: In fact, suppose the contrary and that $\sigma$ is such
that $K_{n-1}(F) > \sigma > K_n(F)$. Consequently, the optimal procedure
begins with machine II and then must play I at the second trial.
This contradicts Lemma 4.
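
The threshold $K_n(F)$ can be computed numerically by dynamic programming. The sketch below assumes a Beta$(a, b)$ prior on the unknown machine (an assumption made for convenience, so that the a posteriori distributions stay in the same family) and locates $K_n$ by bisection on $\sigma$; `make_solver` and `K` are illustrative names.

```python
from functools import lru_cache

def make_solver(sigma):
    """Return (Y, play_I) for a known machine with success probability sigma.

    Y(a, b, n) is the optimal expected yield with n trials left when the
    unknown machine has a Beta(a, b) distribution (assumed family)."""
    @lru_cache(maxsize=None)
    def Y(a, b, n):
        if n == 0:
            return 0.0
        return max(play_I(a, b, n), sigma + Y(a, b, n - 1))

    def play_I(a, b, n):
        mean = a / (a + b)
        return (mean + mean * Y(a + 1, b, n - 1)
                + (1 - mean) * Y(a, b + 1, n - 1))

    return Y, play_I

def K(a, b, n, tol=1e-10):
    """Largest sigma for which playing the unknown machine first is optimal."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        _, play_I = make_solver(mid)
        if play_I(a, b, n) > n * mid:  # I strictly beats using II throughout
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

thresholds = [K(1, 1, n) for n in range(1, 6)]  # uniform prior on rho
print([round(k, 4) for k in thresholds])
```

For the uniform prior the computed thresholds increase with $n$, as Lemma 5 asserts: the more trials remain, the more the information gained from the unknown machine is worth.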

Lemma 6. For any distribution $F$ and any $n$,

$Y(F_s, n) \ge Y(F_f, n)$

where $Y(F, n)$ represents the expected yield following an optimal
policy for $n$ moves when $F$ is the given a priori distribution of $\rho$.

Proof: The proof is by induction on $n$. If $K_n(F_s) \ge \sigma > K_n(F_f)$,
then $Y(F_s, n) \ge n\sigma \ge Y(F_f, n)$, from which we conclude that the
lemma is valid. If $\sigma > K_n(F_s)$ and $\sigma > K_n(F_f)$, then
$Y(F_s, n) = Y(F_f, n) = n\sigma$, and again Lemma 6 is true. Thus suppose both
$K_n(F_s), K_n(F_f) \ge \sigma$; then

$Y(F_s, n) = E(F_s) + E(F_s)\,Y(F_{ss}, n-1) + [1 - E(F_s)]\,Y(F_{sf}, n-1) = E(F_s) + A_n$

while

$Y(F_f, n) = E(F_f) + E(F_f)\,Y(F_{fs}, n-1) + [1 - E(F_f)]\,Y(F_{ff}, n-1) = E(F_f) + B_n$.

The induction hypothesis shows that

$Y(F_{ss}, n-1) \ge Y(F_{sf}, n-1) = Y(F_{fs}, n-1) \ge Y(F_{ff}, n-1)$

and thus any convex combination of the first two terms is larger
than or equal to any convex combination of the last two terms.
This yields $A_n \ge B_n$, but evidently $E(F_s) \ge E(F) \ge E(F_f)$ and hence

$Y(F_s, n) \ge Y(F_f, n)$.

Lemma 7. For any distribution $F$, we have

$K_n(F_s) \ge K_n(F_f)$.

Proof: (By contradiction) Suppose $\sigma$ is such that

$K_n(F_s) < \sigma < K_n(F_f)$.

We secure

$n\sigma < Y(F_f, n) \le Y(F_s, n)$

by Lemma 6. This contradicts the fact that $n\sigma$ is the optimal
yield when the a priori distribution is $F_s$.

Lemma 8. If success occurs on either machine while following
an optimal procedure, then the same machine is employed at the
next trial.

Proof: It has been shown by Lemma 4 that if the known
machine is ever used, then one never departs from it according
to an optimal procedure. To complete the proof, it remains to
show that if success occurs on I, then one chooses this same
machine the next time. It is clearly sufficient to show this
for the first two trials. Suppose the lemma is false, that I
is used, a success occurs and one switches to II. Thus
$\sigma > K_{n-1}(F_s) \ge E(F_s) \ge E(F)$ by Lemma 5. By Lemma 7, also
$K_{n-1}(F_f) < \sigma$. Consequently,

$n\sigma \le Y(F, n) = E(F) + (n-1)\sigma$

and thus $E(F) \ge \sigma$, which contradicts the above inequality.

Another property valid for this model is contained in
Lemma 9.

Lemma 9. The a priori expected value for each stage is non-decreasing
when pursuing an optimal strategy.

Proof: It is sufficient to show this for the first two
steps. If II is used at the first step, the result is trivial
in view of Lemma 4. On the other hand, if I is used, then the
expected value for the first step is $E(F)$. If one continues with
I regardless of the outcome of the first trial, then the a priori
expected value is again $E(F)$ for the second stage, which substantiates
the conclusion of the lemma.

It remains only to consider the case where the second trial
depends on the result of the first trial. On account of Lemma 8,
if success occurred first, then I is again chosen. Suppose a
failure occurs and the optimal strategy calls for a switch; then
$\sigma > K_{n-1}(F_f) \ge E(F_f)$. Consequently, the expected value at the
second stage is

$\mu_1 \cdot \dfrac{\mu_2}{\mu_1} + (1 - \mu_1)\sigma \ge \mu_2 + (1 - \mu_1)\dfrac{\mu_1 - \mu_2}{1 - \mu_1} = \mu_1$

where $\mu_i$ are the moments about zero of $F$.

As we have seen in §2, the conclusion of Lemma 9 is not always true
in the general model.

The above lemmas enable us to describe completely the optimal
strategy. To determine the explicit value of $K_n(F)$, we assume
that $\sigma = K_n(F)$. It is clear in view of Lemma 8 that the optimal
strategy has the following form for appropriate $k_i$ (defined below).

(A) At the first step choose I and stay with it until a
failure occurs.

(B) There exists an integer $k_1 \ge 0$ such that if at least
$k_1$ successes have occurred before the one failure, then proceed
with I. Otherwise, if less than $k_1$ successes occur before the
failure, then change to II from there on.

(C) A corresponding integer $k_2$ is attached to two failures,
i.e., if two failures have resulted and less than $k_1 + k_2$ successes,
then switch to II; otherwise continue with I.

(D) Generally, if $r$ failures and $s$ successes have occurred
where $k_1 + \cdots + k_{r-1} \le s < k_1 + k_2 + \cdots + k_r$, then change to II;
otherwise continue with I.

The yield due to the strategy prescribed above can be
collected in the following way: All the terms with no failure
have the form

$I_0 = \int (\rho + \rho^2 + \cdots + \rho^n)\, dF(\rho)$.

All the terms with one failure for machine I combine to yield
according to the choice of $k_1$ the value

$I_1 + I_1(\sigma) = \int \left[\binom{1}{1}\rho + \binom{2}{1}\rho^2 + \cdots + \binom{n-k_1-1}{1}\rho^{n-k_1-1}\right]\rho^{k_1}(1 - \rho)\, dF(\rho)$
$\qquad + \sigma \int \left[(n-1) + (n-2)\rho + \cdots + (n-k_1)\rho^{k_1-1}\right](1 - \rho)\, dF(\rho)$.

Analogously, the contributions corresponding to two failures for
machine I give the quantity

$I_2 + I_2(\sigma) = \int [\,\cdots\,](1 - \rho)^2\, dF(\rho)$
$\qquad + \sigma \int \left[(n-k_1-2)\binom{1}{1} + (n-k_1-3)\binom{2}{1}\rho + \cdots + (n-k_1-k_2-1)\binom{k_2}{1}\rho^{k_2-1}\right]\rho^{k_1}(1 - \rho)^2\, dF(\rho)$.

The terms involving exactly $r$ failures for machine I yield

(7 1  vV'i-/  e  < )+  •— < z:1  {< )+-•-+<  ) >  •  •  * 

*  _a  1  a'  2  2  ' 


I^*l~a2*'**r-lC  a  -1  k  ,  -♦■a  —1  \[b  b  +1 

,?o  ((*r-)+-(  LI  >)K)r(brr^-( 


b  +n-  Z  k.  — r— 1  n-  Z  k.- 

V!  *  “ 1 


^♦kg* 

f>  (l-p)rdF(f) 


l*-2 


+  <r 


1  rarl  k_+a,-l  A  1-1  C  a  ,-l  k.+a,,  ,-1  \ 

/  zo{(.;_i)— < a2j  )}e-  z:  >;■ 

a,  -0 v  1  1  '  a  ,  »0  r— 1  r— 1 

l  r— 1 


‘^5  ki-r-!)(  £  )f+  • 


cr+l 


r  c  +k  k  -1 

•  (n-  £  ki-r)(  ;  >  r 

1-1  r 


] 


r— 1 
Z  k, 
1-1 


(1-p)3 
where $b_r = r - a_1 - a_2 - \cdots - a_{r-1}$, $c_r = r - 1 - a_1 - \cdots - a_{r-2}$,
and we interpret $\binom{c}{-1} = 0$ for $c \ne -1$ with $\binom{-1}{-1} = 1$. Our objective
is fulfilled in the following theorem.


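Theorem 2.

$K_n(F) = \sup_{k_1, k_2, \cdots} \dfrac{I_0 + I_1 + I_2 + \cdots + I_r + \cdots}{\bar I_0 + \bar I_1 + \bar I_2 + \cdots + \bar I_r + \cdots} = \sup_{k_1, k_2, \cdots} \dfrac{I}{J}$

where $I = I_0 + I_1 + I_2 + \cdots$, $J = \bar I_0 + \bar I_1 + \bar I_2 + \cdots$, and where $\bar I_r$ is obtained from $I_r$ by
replacing $dF(\rho)$ by $dF(\rho)/\rho$, and the $k_i$ are subject to the restrictions

$0 \le k_1 \le n - 1$,  $0 \le k_2 \le n - 2 - k_1$,  $\cdots$,  $0 \le k_r \le n - r - \sum_{i=1}^{r-1} k_i$,  $\cdots$,

with the understanding, for instance, that if $n - r - \sum_{i=1}^{r} k_i = 0$
then

$\dfrac{I}{J} = \dfrac{I_0 + I_1 + \cdots + I_{r-1}}{\bar I_0 + \bar I_1 + \cdots + \bar I_{r-1}}$.
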

The  proof  conalata  in  showing  that 


no  -  I^o)  -  I2(o-)  -  I?(o)-**  -  c  [Iq  +  1^  +  I2  +  •••]  . 


The  general  formula  is  established  by  a  long  Induction  argument  and 
we  shall  illustrate  the  method  of  proof  by  considering  only  the 
first  few  sums.  The  general  proof  can  be  established  by  an  exten¬ 
sion  of  the  argument  used.  The  basic  identity  used  extensively 
in  the  proof  is 
$$n - (n-1)\int (1-p)\,dF(p) - (n-2)\int p(1-p)\,dF(p) - \cdots - \int p^{n-2}(1-p)\,dF(p) = \int (1 + p + p^2 + \cdots + p^{n-1})\,dF(p). \tag{8}$$

This can be verified directly by a simple induction. As an immediate consequence of (8), we obtain

$$n - (n-1)\int (1-p)\,dF - (n-2)\int p(1-p)\,dF - \cdots - (n-k_1)\int p^{k_1-1}(1-p)\,dF = (n-k_1)\int p^{k_1}\,dF + \int (1 + p + \cdots + p^{k_1-1})\,dF. \tag{9}$$
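As a sanity check, identity (9) can be verified in exact arithmetic for one particular prior. The following Python sketch assumes the uniform prior F(p) = p, for which the j-th moment is 1/(j+1); the function names are illustrative and are not part of the paper. Taking k = n makes the last subtracted term vanish and recovers identity (8).

```python
from fractions import Fraction


def moment(j):
    """j-th moment of the uniform prior F(p) = p on [0, 1] (an assumed example prior)."""
    return Fraction(1, j + 1)


def lhs_identity_9(n, k):
    """n - sum over j = 1..k of (n - j) * E[p^(j-1) * (1 - p)]."""
    total = Fraction(n)
    for j in range(1, k + 1):
        total -= (n - j) * (moment(j - 1) - moment(j))
    return total


def rhs_identity_9(n, k):
    """(n - k) * E[p^k] + E[1 + p + ... + p^(k-1)]."""
    return (n - k) * moment(k) + sum(moment(i) for i in range(k))


# Check every admissible pair (n, k); k = n corresponds to identity (8).
for n in range(1, 8):
    for k in range(0, n + 1):
        assert lhs_identity_9(n, k) == rhs_identity_9(n, k)
```

The check covers only one prior, so it illustrates the identity rather than proving it; the telescoping argument in the text is what holds for general F.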


Using (9), we secure

$$n\sigma - I_1(\sigma) = \sigma\left[\int (1 + p + \cdots + p^{k_1-1})\,dF + (n-k_1)\int p^{k_1}\,dF\right]. \tag{10}$$


Repeated  application  of  (9)  gives 


(11)  cfn-l^)  /p^dF  -  1 2(<r)  -  crfkgln-^-kg-l)  Jp1  2(l-p)dF(p) 

k.  +k-  k,  k.  +1  k.  +k9— 1 

♦  (n— k^— kg)  JfP-  2dF(p)  +  /  (p  1+p'L  +  ***+f’1  )dF 

+  /fNl-p)  [  (J)  +  (*)p  +  •"  +  (  i2)f  2  ]dF(p)  J  . 


To  describe  one  more  step  in  the  process  we  find  again  by  using  (9) 
several  times  that 
(12)  ajk2(n- l^-kg-l)/^1  2(l-p)dF  +  (n-l^-kgj/f^^dpj  -  I^(c)

-  oj( n-J^-kg-k^ )  /f*1  +k2+k3dp  +  (kg+kjJtn-kj-kg-kj-lJ/p^^^Jl-^dP 

+  [kgk3+(J)+(2)+ •••  t"5)]  (n-k1-k2-k3-2)/fkl+k2+k3(l-^2dP 

♦  /  (^...^^JdF  ♦  /(l-p)^+k2[(kf )  ♦  (kf  ^ 

♦  **‘+  (  21  3)f^  ]d?  +  /  k2[(J)  +  (?)^----*'(i5)fk3  ^P^^U-P)2*? 
+  /  [(2)  +  (lv+  *"+  ^2  1)fC3  3]pkl+k2fr-p)8dF  . 

The pattern is now clear that on combining (10), (11), (12) and
continuing  In  the  same  manner  we  find  that  crj'  >  J  or  cr  a  i.  , 


Hence $K_n(F) = \sup_{k_i} J$.

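Some special cases are worth noting:

$$K_1(F) = \int p\,dF(p), \qquad K_2(F) = \frac{\int (p + p^2)\,dF(p)}{\int (1 + p)\,dF(p)},$$

$$K_3(F) = \max\left(\frac{I_0}{\tilde{I}_0},\ \frac{I_0 + I_1}{\tilde{I}_0 + \tilde{I}_1}\right) = \max\left(\frac{\int (p + p^2 + p^3)\,dF}{\int (1 + p + p^2)\,dF},\ \frac{\int (p + 2p^2)\,dF(p)}{\int (1 + 2p)\,dF(p)}\right).$$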


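Unfortunately, both terms in $K_3(F)$ can occur; e.g., if $F(p) = p$, then

$$K_3(F) = \frac{13}{22},$$

which is the first term, while if $F(p) = p^{1/5}$, then the second term is the larger and

$$K_3(F) = \frac{\int (p + 2p^2)\,p^{-4/5}\,dp}{\int (1 + 2p)\,p^{-4/5}\,dp}.$$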
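The claim that either term of K_3(F) may be the larger one can be checked in exact arithmetic. The sketch below assumes priors of the form F(p) = p^a on [0, 1], whose j-th moment is a/(a+j); the exponent a = 1/5 is the reading adopted here for the second example, and the function names are illustrative only.

```python
from fractions import Fraction


def moment(j, a):
    """E[p^j] under the assumed prior F(p) = p^a on [0, 1], i.e. a / (a + j)."""
    return a / (a + j)


def k3_terms(a):
    """Return the two candidate ratios whose maximum is K_3(F) for F(p) = p^a."""
    m1, m2, m3 = (moment(j, a) for j in (1, 2, 3))
    first = (m1 + m2 + m3) / (1 + m1 + m2)  # int(p + p^2 + p^3) dF / int(1 + p + p^2) dF
    second = (m1 + 2 * m2) / (1 + 2 * m1)   # int(p + 2 p^2) dF / int(1 + 2p) dF
    return first, second


for a in (Fraction(1), Fraction(1, 5)):
    first, second = k3_terms(a)
    print(f"a = {a}: first = {first}, second = {second}, K_3 = {max(first, second)}")
```

For a = 1 the first ratio is 13/22 and exceeds the second (7/12); for a = 1/5 the order reverses, with the second ratio equal to 23/88.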


In general, the expression for $K_n(F)$ in Theorem 2 cannot be simplified in any way and represents the simplest form for the decision function available, which again testifies to the complex nature of optimal strategies in such sequential design problems.

For practical purposes a reasonable approximation to $K_n(F)$ can be obtained by choosing $k_1 = n - 1$. In that case, one compares $\sigma$ with

$$L_n(F) = \frac{\int (p + p^2 + \cdots + p^n)\,dF}{\int (1 + p + \cdots + p^{n-1})\,dF}.$$

This applies well for n small ($n \le 10$). In the case of $F(p) = p$, for example, $K_n(F) = L_n(F)$ when $n \le 4$, but they cease to be equal for n = 5 and on. It is worth noting that $L_n(F)$ shares many of the properties of $K_n(F)$.
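The comparison between K_n and L_n can be explored numerically. The sketch below is not the paper's method: it computes the critical value of σ by backward induction over posterior success and failure counts, assuming the uniform prior F(p) = p (a Beta(1, 1) prior), and locates the switch point by bisection under the assumption that the advantage of starting on machine I changes sign exactly once in σ. All names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache


def advantage_of_unknown(n, sigma, s0=1, f0=1):
    """Optimal yield when trial 1 uses machine I minus optimal yield when it uses II.

    The prior is Beta(s0, f0); s0 = f0 = 1 is the uniform prior (an assumption here).
    """

    @lru_cache(maxsize=None)
    def value(s, f, m):
        # Optimal expected number of successes with m trials left.
        if m == 0:
            return 0.0
        return max(play_known(s, f, m), play_unknown(s, f, m))

    def play_known(s, f, m):
        return sigma + value(s, f, m - 1)

    def play_unknown(s, f, m):
        mean = s / (s + f)  # posterior mean of p
        return mean * (1.0 + value(s + 1, f, m - 1)) + (1.0 - mean) * value(s, f + 1, m - 1)

    return play_unknown(s0, f0, n) - play_known(s0, f0, n)


def critical_sigma(n, tol=1e-12):
    """Bisection for the sigma at which both first moves are equally good."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if advantage_of_unknown(n, mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def L_uniform(n):
    """L_n(F) for the uniform prior, in exact arithmetic."""
    numerator = sum(Fraction(1, j + 1) for j in range(1, n + 1))
    denominator = sum(Fraction(1, j + 1) for j in range(0, n))
    return numerator / denominator


for n in range(1, 7):
    print(n, critical_sigma(n), float(L_uniform(n)))
```

Printing the two columns side by side shows where the approximation starts to fall below the critical value; the sketch deliberately states no expected figures beyond those given in the text.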


Lemma 10. $L_n(F)$ is monotonic increasing in n for any F and

Ln(pS)  >  Ln(F)  ^  Ln(pf)- 

Proof: The proof of Lemma 10 is based on the well-known result that if $a_r/b_r$ is increasing with $a_r, b_r > 0$, then $(a_1 + \cdots + a_n)/(b_1 + \cdots + b_n)$ is also increasing. Since

$$\frac{\int p^{r}\,dF}{\int p^{r-1}\,dF} \le \frac{\int p^{r+1}\,dF}{\int p^{r}\,dF}$$

by virtue of Hölder's inequality, we obtain that $L_n(F)$ is monotone increasing as a consequence of the above cited result. The latter part of the lemma can be proved readily using this same result.
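The two ingredients of the proof, increasing moment ratios and increasing ratios of partial sums, can be observed directly for a concrete family of priors. The sketch below assumes F(p) = p^a with moments a/(a+j); the family and the helper names are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def moment(j, a):
    """E[p^j] under the assumed prior F(p) = p^a on [0, 1]."""
    return a / (a + j)


def L(n, a):
    """L_n(F) = int(p + ... + p^n) dF / int(1 + ... + p^(n-1)) dF for F(p) = p^a."""
    numerator = sum(moment(j, a) for j in range(1, n + 1))
    denominator = sum(moment(j, a) for j in range(0, n))
    return numerator / denominator


def is_increasing(values):
    return all(x < y for x, y in zip(values, values[1:]))


for a in (Fraction(1, 5), Fraction(1), Fraction(3)):
    # a_r / b_r with a_r = E[p^r] and b_r = E[p^(r-1)]
    ratios = [moment(r, a) / moment(r - 1, a) for r in range(1, 12)]
    partial_sum_ratios = [L(n, a) for n in range(1, 12)]
    assert is_increasing(ratios)
    assert is_increasing(partial_sum_ratios)
```

For this family the moment ratio is (a + r - 1)/(a + r), so the monotonicity is visible in closed form; the lemma asserts it for arbitrary F.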


Theorem 3. If F has the property that $\int \frac{dF(p)}{1-p} = \infty$, then $K_n(F) \to 1$. Moreover, the known machine II is never used if

$$\sigma < \frac{\int p\,(1-p)^{n-1}\,dF(p)}{\int (1-p)^{n-1}\,dF(p)}.$$


Proof:

$$1 \ge K_n(F) \ge \frac{\int (p + p^2 + \cdots + p^n)\,dF}{\int (1 + p + \cdots + p^{n-1})\,dF} \ge 1 - \epsilon$$

for n sufficiently large. The right-hand side of the second part of Theorem 3 is the expected value at trial n if n - 1 tries on machine I produced all failures. The last assertion follows since $K_n(F) \ge E(F)$ for any F and any n.

The  interpretation  of  Theorem  3  is  the  intuitive  fact  that  if 
there  exists  substantial  positive  probability  of  success  and 
if  the  number  of  trials  is  sufficiently  large,  the  unknown  machine 
should be played first unless $\sigma = 1$.
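To make the two quantities in this discussion concrete, the sketch below evaluates them for the uniform prior F(p) = p, which is an assumed example. It computes the expected value at trial n after n - 1 failures on machine I, and the lower bound L_n(F) used in the proof, written through harmonic numbers. The function names are illustrative.

```python
from fractions import Fraction


def all_failure_posterior_mean(n):
    """E[p | n - 1 failures and no successes] under the uniform prior.

    Uses int p (1-p)^(n-1) dp = 1 / (n (n+1)) and int (1-p)^(n-1) dp = 1 / n.
    """
    return Fraction(1, n * (n + 1)) / Fraction(1, n)


def L_uniform(n):
    """L_n(F) for the uniform prior: (H_(n+1) - 1) / H_n with H_n the n-th harmonic number."""
    harmonic = 0.0
    for j in range(1, n + 1):
        harmonic += 1.0 / j
    return (harmonic + 1.0 / (n + 1) - 1.0) / harmonic


for n in (10, 100, 1000, 10000):
    print(n, float(all_failure_posterior_mean(n)), L_uniform(n))
```

The lower bound approaches 1 only logarithmically in n for this prior, which is in keeping with the requirement that the number of trials be sufficiently large.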


§4. Certain Game Aspects of the General Problem.

The first type of game problem we consider in this section is as follows: Let $\Phi_N(F(p, \sigma), S)$ denote the expected value obtained when $F(p, \sigma)$ is the a priori distribution for the probabilities of successes p and σ of machines I and II respectively, and S defines a strategy.

The number N denotes the fixed number of trials to be used throughout these game considerations. We therefore drop the subscript N.

The function $\Phi(F, S)$ is evaluated as follows: The a priori distribution is given first and the policy S is a procedure in terms of inequalities involving the complete first n moments of the distribution and in $F(p, \sigma)$. (See the beginning of §2.)

Theorem 4. If $\Phi(F, S)$ is evaluated as indicated above, then

$$\min_F \max_S \Phi(F, S) = \max_S \min_F \Phi(F, S) = \max(N\alpha, N\beta),$$

where the class of distributions is restricted by the condition $\int p\,dF(p, \sigma) = \alpha$ and $\int \sigma\,dF(p, \sigma) = \beta$. An optimal minimax distribution is $F = I_{\alpha,\beta}$ (the distribution concentrating fully at $(\alpha, \beta)$) while $S_0$ is an optimal minimax policy.

Proof: If one considers the distribution $I_{\alpha,\beta}$, then regardless of the strategy S employed, an upper bound for the yield is $\max(N\alpha, N\beta)$. This is evident since after every performance, the a posteriori distribution is unchanged and equal to $I_{\alpha,\beta}$. It represents the only distribution where the information is complete and no experimentation contributes any value. On the other hand, if the statistician employs policy $S_0$, then by virtue of the conditions on the moments of F the yield at the first step is $\max(\alpha, \beta)$.

Lemma 3 implies that $\Phi(F, S_0) \ge N \max(\alpha, \beta)$ for any F of the type examined. The proof of Theorem 4 is hereby complete.
Thus, the intuitive strategy $S_0$ does assume a certain general significance on account of Theorem 4. We remark that there exist many other optimal minimax policies aside from $S_0$.

Another type of game can be introduced where decisions S are not functions of an a priori distribution but depend only on the observed number of successes and failures to that point. The expected value $\Phi(F, S)$ is evaluated in terms of F and the game is considered where F is restricted by $\int p\,dF = \alpha$ and $\int \sigma\,dF = \beta$. The analysis of this game remains an open question.